cDC2 timecourse from total culture

Recap

Prior 96w evaluation of TurboCapture-Seq v2 showed low UMIs recovered and high seq saturation. I did low throughput troubleshooting and didn’t see any issues. Process this experiment myself taking care to remove residual liquids from wash steps.

Process Hui Shi of Naik lab Bcor + Flt3 timecourse. Includes a few of my samples.

Samples

  • Sorted dendritic cells
  • HEK293T cell lysates in 1x TCL @ cells/uL
  • PBMC cell lysates in 1x TCL @ 500 cells/uL
  • No template control 1x TCL

Notebook recap

SCE object in generate in 1A_generateSCE_reads notebook. The samples were then clustered in the 2B Clustering Wt notebook

Notebook aim

Compare different subsets sorted from dendritic cells at 5 and 7 days. The easiest contrasts to interpret are the difference from the common dendritic cell precursor (CDP).

Workflow is from:
https://bioconductor.org/packages/devel/workflows/vignettes/RNAseq123/inst/doc/limmaWorkflow.html#differential-expression-analysis

Read SCE and preprocessing

This was generated in notebook 2B.

sce <- readRDS(here::here(
   "data/TIRE_dendritic_mouse/SCEs", "DCs_cluster.sce.rds"))

dge <- scran::convertTo(sce, type="edgeR")
dge_orig <- dge

Have a look at the important metadata.
There are not enough replicates for the preDC contrast to be meaningful.

tb <- dge$samples[,c("Cell_type", "Cell_number", "Ligand", "Timepoint_Day")]

tb %>% 
  dplyr::count(Cell_type, Ligand, Timepoint_Day)

Recap the PCA

The cell type is the major difference between samples with the timepoint being much less so.
Write the day as text.

pca_tb <- as_tibble(reducedDim(sce))
pca_tb$Cell_Type <- sce$Cell_type
pca_tb$Timepoint <- sce$Timepoint_Day
pca_tb$Timepoint <- as.factor(pca_tb$Timepoint)

plt2 <- ggplot(pca_tb, aes(x=PC1, y=PC2, colour=Cell_Type, label=Timepoint)) +
  geom_text(size=5) +
  xlab("PC1 (34%)") + ylab("PC2 (28%)") +
  scale_colour_brewer(palette = "Dark2") +
  theme_Publication()
  
plt2

Filter samples and low expressed genes

Interested in only the total DC cultures and cDC2s. Remove the other cell types.

keep_samples <- c("Total_DC_culture", "cDC2")
dge <- dge[,dge$samples$Cell_type %in% keep_samples]

Doing this reduces the multiple testing burden and fits variation better. After half the genes are removed leaving 10,000.

dim(dge)
## [1] 21484    15
keep.exprs <- filterByExpr(dge, group=dge$samples$Cell_type, min.count=1)
dge <- dge[keep.exprs,, keep.lib.sizes=FALSE]
dim(dge)
## [1] 9619   15

Correct for composition biases by computing normalization factors with the trimmed mean of M-values method.

dge <- calcNormFactors(dge)

Look at timecourse of differntiation from total DC culture to cDC2

Looking at the PCA and deciding on a subset to focus on cDC2 seems like a good choice:

  • Is a mature subset unlike CDP and therefore no further on trajectory to go
  • Has some variation between day 5 and day 7

Day 5:
Expansion of DC progenitors and early differentiation into immature DCs. Presence of a heterogeneous mix of progenitor cells and early DC subsets. Limited functional capacity.

Day 7:
Peak numbers of mature DCs with distinct cDC1, cDC2, and pDC populations. Enhanced antigen presentation and cytokine production abilities. Optimal time point for harvesting DCs for functional assays.

Set up a spline based analysis

Fit spline

Use a cubic regression spline curve with 3 degrees of freedom.

dge$samples$Timepoint_Day <- as.numeric(as.character(dge$samples$Timepoint_Day))
splines <- ns(dge$samples$Timepoint_Day, df = 3)

design <- model.matrix(~splines, dge$samples)
head(design)
##            (Intercept)    splines1  splines2   splines3
## ACAGTAGCTC           1  0.00000000 0.0000000  0.0000000
## AGACGCATCG           1 -0.04015528 0.3459654 -0.2631210
## ATCAGCGCGA           1 -0.11108086 0.4651511  0.6459298
## CCGTATGTAG           1  0.00000000 0.0000000  0.0000000
## CGTCAATAGT           1  0.00000000 0.0000000  0.0000000
## CGTCACTCTA           1 -0.11108086 0.4651511  0.6459298

The three coefficients do not have any particular meaning. Hypothesis testing would only make sense if the three coefficients are assessed together. The advantage of using a cubic spline curve is that it provides more stable fit at the end points compared to a polynomial.

The spline curve with 3 degrees of freedonm has 2 knots where cubic polynomials are splined together. In general, choosing a number of degrees of freedom to be in range of 3-5 is reasonable.

Fit BCV

Circle back to see why BCV has no dots.

par(mfrow=c(1,2))

dge <- estimateDisp(dge, design)
summary(dge$trended.dispersion)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## 0.001107 0.013019 0.014557 0.013520 0.015816 0.021353
plotBCV(dge)

fit <- glmQLFit(dge, design, robust=TRUE)
summary(fit$var.prior)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.9699  0.9916  1.0414  1.0314  1.0581  1.7072
plotQLDisp(fit)

Timecourse analysis

In a time course experiment, we are looking for genes that change expression level over time. Here, the design matrix uses 3 natural spline basis vectors to model smooth changes over time, without assuming any particular pattern to the trend.
We test for a trend by conducting F-tests on 3 df for each gene:

fit <- glmQLFTest(fit, coef=2:4)

The topTags function lists the top set of genes with most significant time effects.

results <- as_tibble(as.data.frame(
  topTags(fit, n=length(fit$genes$Symbol))
  ))
results

The total number of genes with significant (5% FDR) changes at different doses can be examined with decideTests.

There are 3200 differentially expressed genes that change over the 7 days.

summary(decideTests(fit))
##        splines3-splines2-splines1
## NotSig                       6426
## Sig                          3193

Note that all three spline coefficients should be tested together in this way. It is not meaningful to replace the F-tests with t-tests for the individual coefficients, and similarly the logFC columns of the top table do not have any interpretable meaning.
The trends should instead be interpreted by way of trend plots, as we show now.
Finally, we visualize the fitted spline curves for the top four genes. We start by computing the observed and fitted log-CPM values for each gene:

top_genes <- head(results$Symbol,4)

logCPM.obs <- as.data.frame(cpm(dge, log=TRUE, prior.count=fit$prior.count))
logCPM.obs$ID <- row.names(logCPM.obs)
logCPM.fit <- as.data.frame(cpm(fit, log=TRUE))
logCPM.fit$ID <- row.names(logCPM.fit)

# Mung the cpm columns into a single tibble
lcm.obs <- pivot_longer(logCPM.obs[top_genes,], cols = -ID, names_to = "sample", values_to = "count") %>% 
  rename(obs_cpm = count)
lcm.fit <- pivot_longer(logCPM.fit[top_genes,], cols = -ID, names_to = "sample", values_to = "count") %>% 
  rename(fit_cpm = count)

lcm.obs$fit_cpm <- lcm.fit$fit_cpm

# Add back the sample metadata
lcm <- left_join(lcm.obs, results[,c("ID", "Symbol")])
lcm <- left_join(lcm, dge$samples[,c("Well_BC", "Timepoint_Day")], by=c("sample" = "Well_BC"))

Log scale visualisation

The hits are all MHCII genes unsurprisingly.

The CD74 gene encodes a protein known as the HLA class II histocompatibility antigen gamma chain, also referred to as the invariant chain (Ii) or CD74. This protein is crucial for the proper functioning of the immune system, particularly in antigen presentation by Major Histocompatibility Complex (MHC) class II molecules.

Any gene that begins with H2- in mice is part of the H-2 complex, the murine equivalent of the human MHC (called HLA). These genes encode proteins involved in the immune system’s ability to recognize self from non-self.

Roles of H2- Genes:

  • MHC Class I Genes (e.g., H2-K, H2-D): Present endogenous peptides (from within the cell) to CD8+ T cells.
  • MHC Class II Genes (e.g., H2-Aa, H2-Ab1): Present exogenous peptides (from outside the cell) to CD4+ T cells.
ggplot(lcm) +
  geom_point(aes(y = obs_cpm, x=Timepoint_Day), color = "black") +
  geom_smooth(aes(y = fit_cpm, x=Timepoint_Day), method = "loess", se = FALSE, color = "red") +
  xlab("Time (days)") + ylab("logCPM") +
  theme_Publication(base_size = 22) + theme(axis.text.x = element_text(angle = 0, vjust = 0.5, hjust = 1)) +
  facet_wrap(~ ID, scales = "free_y", labeller = labeller(ID = label_value)) +
  theme(strip.text = element_text(face = "bold.italic"))

Conclusion

The results make sense to me. Will confirm with the domain experts.

Session info

sessionInfo()
## R version 4.4.1 (2024-06-14)
## Platform: x86_64-pc-linux-gnu
## Running under: Red Hat Enterprise Linux 9.4 (Plow)
## 
## Matrix products: default
## BLAS:   /stornext/System/data/software/rhel/9/base/tools/R/4.4.1/lib64/R/lib/libRblas.so 
## LAPACK: /stornext/System/data/software/rhel/9/base/tools/R/4.4.1/lib64/R/lib/libRlapack.so;  LAPACK version 3.12.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: Australia/Melbourne
## tzcode source: system (glibc)
## 
## attached base packages:
##  [1] grid      splines   stats4    stats     graphics  grDevices utils    
##  [8] datasets  methods   base     
## 
## other attached packages:
##  [1] ggthemes_5.1.0              here_1.0.1                 
##  [3] org.Mm.eg.db_3.19.1         AnnotationDbi_1.66.0       
##  [5] ggvenn_0.1.10               knitr_1.48                 
##  [7] patchwork_1.3.0             edgeR_4.2.2                
##  [9] limma_3.60.6                scater_1.32.1              
## [11] scran_1.32.0                scuttle_1.14.0             
## [13] lubridate_1.9.3             forcats_1.0.0              
## [15] stringr_1.5.1               dplyr_1.1.4                
## [17] purrr_1.0.2                 readr_2.1.5                
## [19] tidyr_1.3.1                 tibble_3.2.1               
## [21] ggplot2_3.5.1               tidyverse_2.0.0            
## [23] SingleCellExperiment_1.26.0 SummarizedExperiment_1.34.0
## [25] Biobase_2.64.0              GenomicRanges_1.56.2       
## [27] GenomeInfoDb_1.40.1         IRanges_2.38.1             
## [29] S4Vectors_0.42.1            BiocGenerics_0.50.0        
## [31] MatrixGenerics_1.16.0       matrixStats_1.4.1          
## 
## loaded via a namespace (and not attached):
##  [1] DBI_1.2.3                 gridExtra_2.3            
##  [3] rlang_1.1.4               magrittr_2.0.3           
##  [5] RSQLite_2.3.7             compiler_4.4.1           
##  [7] mgcv_1.9-1                DelayedMatrixStats_1.26.0
##  [9] png_0.1-8                 vctrs_0.6.5              
## [11] pkgconfig_2.0.3           crayon_1.5.3             
## [13] fastmap_1.2.0             XVector_0.44.0           
## [15] labeling_0.4.3            utf8_1.2.4               
## [17] rmarkdown_2.28            tzdb_0.4.0               
## [19] UCSC.utils_1.0.0          ggbeeswarm_0.7.2         
## [21] bit_4.5.0                 xfun_0.48                
## [23] bluster_1.14.0            zlibbioc_1.50.0          
## [25] cachem_1.1.0              beachmat_2.20.0          
## [27] jsonlite_1.8.9            blob_1.2.4               
## [29] highr_0.11                DelayedArray_0.30.1      
## [31] BiocParallel_1.38.0       irlba_2.3.5.1            
## [33] parallel_4.4.1            cluster_2.1.6            
## [35] R6_2.5.1                  RColorBrewer_1.1-3       
## [37] bslib_0.8.0               stringi_1.8.4            
## [39] jquerylib_0.1.4           Rcpp_1.0.13              
## [41] Matrix_1.7-0              igraph_2.0.3             
## [43] timechange_0.3.0          tidyselect_1.2.1         
## [45] rstudioapi_0.17.0         abind_1.4-8              
## [47] yaml_2.3.10               viridis_0.6.5            
## [49] codetools_0.2-20          lattice_0.22-6           
## [51] KEGGREST_1.44.1           withr_3.0.1              
## [53] evaluate_1.0.1            Biostrings_2.72.1        
## [55] pillar_1.9.0              generics_0.1.3           
## [57] rprojroot_2.0.4           hms_1.1.3                
## [59] sparseMatrixStats_1.16.0  munsell_0.5.1            
## [61] scales_1.3.0              glue_1.8.0               
## [63] metapod_1.12.0            tools_4.4.1              
## [65] BiocNeighbors_1.22.0      ScaledMatrix_1.12.0      
## [67] locfit_1.5-9.10           colorspace_2.1-1         
## [69] nlme_3.1-164              GenomeInfoDbData_1.2.12  
## [71] beeswarm_0.4.0            BiocSingular_1.20.0      
## [73] vipor_0.4.7               cli_3.6.3                
## [75] rsvd_1.0.5                fansi_1.0.6              
## [77] S4Arrays_1.4.1            viridisLite_0.4.2        
## [79] gtable_0.3.5              sass_0.4.9               
## [81] digest_0.6.37             SparseArray_1.4.8        
## [83] ggrepel_0.9.6             dqrng_0.4.1              
## [85] farver_2.1.2              memoise_2.0.1            
## [87] htmltools_0.5.8.1         lifecycle_1.0.4          
## [89] httr_1.4.7                statmod_1.5.0            
## [91] bit64_4.5.2